205 research outputs found

    Speedup bioinformatics applications on multicore-based processor using vectorizing and multithreading strategies

    Many computationally intensive bioinformatics programs written in C/C++, such as those for multiple sequence alignment and population structure analysis, are not multicore-aware. A multicore processor is an emerging CPU technology that combines two or more independent processors into a single package. The Single Instruction, Multiple Data (SIMD) paradigm is heavily used in this class of processors. Nevertheless, most popular compilers, including Microsoft Visual C/C++ 6.0 and the x86 GNU C compiler (gcc), do not automatically generate SIMD code that fully exploits these processors. To harness the power of the new multicore architecture, certain compiler techniques must be considered. This paper presents a generic compiling strategy to assist the compiler in improving the performance of bioinformatics applications written in C/C++. The proposed framework contains two main steps: multithreading and vectorizing. After following these strategies, an application can achieve higher speedup by taking advantage of the multicore architecture. Because the interconnection among multiple cores is extremely fast, it is suggested that the proposed optimization could be more appropriate than parallelization on a small cluster computer, which has larger network latency and lower bandwidth.

    Prediction of avian influenza A binding preference to human receptor using conformational analysis of receptor bound to hemagglutinin

    BACKGROUND: It is known that the highly pathogenic avian influenza A virus H5N1 binds strongly and with high specificity to the avian-type receptor by its hemagglutinin surface protein. This specificity is normally a barrier to viral transmission from birds to humans. However, strains may emerge with mutated hemagglutinin, potentially changing the receptor binding preference from avian to human-type. This hypothesis has been proven correct, since viral isolates from Vietnam and Thailand have been found which have increased selectivity toward the human cell receptor. The change in binding preference is due to mutation, which can be computationally modelled. The aim of this study is to further explore whether computational simulation could be used as a prediction tool for host type selectivity in emerging variants. RESULTS: Molecular dynamics simulation was employed to study the interactions between receptor models and hemagglutinin proteins from H5N1 strains A/Duck/Singapore/3/97, mutated A/Duck/Singapore/3/97 (Q222L, G224S, Q222L/G224S), A/Thailand/1(KAN-1)/2004, and mutated A/Thailand/1(KAN-1)/2004 (L129V/A134V). The avian receptor was represented by the Siaα(2,3)Gal substructure and the human receptor by Siaα(2,6)Gal. The glycoside binding conformation was monitored throughout the simulations, since high selectivity toward a particular host occurs when the sialoside is bound in a near-optimized conformation. CONCLUSION: The simulation results showed that all hemagglutinin proteins used the same set of amino acid residues to bind with the glycoside; however, some mutations alter linkage preferences. Preference toward human-type receptors is associated with a positive torsion angle, while avian-type receptor preference is associated with a negative torsion angle. According to the conformation analysis of the bound receptors, we could predict the relative selectivity in accordance with in vitro experimental data when disaccharide receptor analogs were used.
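The sign of a torsion angle, which the abstract above links to receptor preference, can be computed from four atomic positions. The following is a minimal sketch of the standard dihedral formula only; the abstract does not say which four glycosidic atoms define the monitored angle, so the function name `torsion_angle` and the coordinates in the usage note are illustrative assumptions, not the authors' analysis code.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def torsion_angle(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in (-180, 180], for atoms p0-p1-p2-p3.

    Which atoms to pass in is an assumption left to the caller; the source
    abstract does not specify the atom quadruple it monitors.
    """
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))
```

For example, with made-up coordinates, `torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))` gives a positive angle of 90 degrees, and mirroring the last point to `(0, -1, 1)` flips the sign.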

    WASP: a Web-based Allele-Specific PCR assay designing tool for detecting SNPs and mutations

    BACKGROUND: Allele-specific (AS) Polymerase Chain Reaction is a convenient and inexpensive method for genotyping Single Nucleotide Polymorphisms (SNPs) and mutations. It has been applied in many recent studies, including population genetics, molecular genetics and pharmacogenomics. Designing primers with existing AS primer design tools is cumbersome for inexperienced users, since information about the SNP/mutation must be acquired from public databases prior to the design. Furthermore, most of these tools do not offer mismatch enhancement for designed primers. The available web applications do not provide a user-friendly graphical input interface or intuitive visualization of their primer results. RESULTS: This work presents a web-based AS primer design application called WASP. This tool can efficiently design AS primers for human SNPs as well as mutations. To assist scientists with collecting necessary information about target polymorphisms, this tool provides a local SNP database containing over 10 million SNPs of various populations from public domain databases, namely NCBI dbSNP, HapMap and JSNP. This database is tightly integrated with the tool so that users can perform the design for existing SNPs without leaving the site. To guarantee specificity of AS primers, the proposed system incorporates a primer specificity enhancement technique widely used in experimental protocols. In particular, WASP makes use of different destabilizing effects by introducing one deliberate 'mismatch' at the penultimate base (second from the 3'-end) of AS primers to improve the resulting primers. Furthermore, WASP offers a graphical user interface built on Scalable Vector Graphics (SVG) that allows users to select SNPs and graphically visualize designed primers and their conditions. CONCLUSION: WASP offers a tool for designing AS primers for both SNPs and mutations. By integrating the database for known SNPs (using gene ID or rs number), this tool eases the awkward process of getting flanking sequences and other related information from public SNP databases. It takes into account the underlying destabilizing effect to ensure the effectiveness of designed primers. With its user-friendly SVG interface, WASP intuitively presents the designed primers and helps users export them or make further adjustments to the design. This software can be freely accessed at http://bioinfo.biotec.or.th/WASP
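The penultimate-base mismatch described in the abstract above can be illustrated with a short sketch. It assumes a forward primer whose 3'-terminal base is the target allele. The function name `allele_specific_primer` and the substitution table `PLACEHOLDER_MISMATCH` are hypothetical: the abstract does not state which replacement base WASP chooses, so the table is a stand-in and not the tool's destabilization rules.

```python
# Placeholder substitution table: an assumption for illustration only,
# NOT the destabilizing-effect rules that WASP applies.
PLACEHOLDER_MISMATCH = {"A": "C", "C": "A", "G": "T", "T": "G"}


def allele_specific_primer(upstream, allele, length=20):
    """Build a forward AS primer ending in `allele` at its 3' end, with a
    deliberate mismatch at the penultimate base.

    `upstream` is the 5' flanking sequence that ends just before the SNP.
    """
    upstream = upstream.upper()
    allele = allele.upper()
    if len(allele) != 1 or allele not in PLACEHOLDER_MISMATCH:
        raise ValueError("allele must be a single base: A, C, G or T")
    if length < 2 or len(upstream) < length - 1:
        raise ValueError("upstream flank is too short for the requested length")
    # Fully matched primer: the last (length - 1) flanking bases plus the allele.
    matched = upstream[len(upstream) - (length - 1):] + allele
    penultimate = matched[-2]
    if penultimate not in PLACEHOLDER_MISMATCH:
        raise ValueError("flank contains a base other than A, C, G or T")
    # Replace the second-to-last base (counting from the 3' end) with a different base.
    return matched[:-2] + PLACEHOLDER_MISMATCH[penultimate] + matched[-1]
```

With the made-up flank `"ACGTACGTACGTACGTACGTAGG"`, allele `"T"` and `length=10`, the fully matched primer would be `GTACGTAGGT`, and the sketch returns `GTACGTAGTT`: only the penultimate base differs, and the 3' end still carries the allele.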

    Study of large and highly stratified population datasets by combining iterative pruning principal component analysis and structure

    BACKGROUND: The ever increasing sizes of population genetic datasets pose great challenges for population structure analysis. The Tracy-Widom (TW) statistical test is widely used for detecting structure. However, it has not been adequately investigated whether the TW statistic is susceptible to type I error, especially in large, complex datasets. Non-parametric, Principal Component Analysis (PCA) based methods for resolving structure have been developed which rely on the TW test. Although PCA-based methods can resolve structure, they cannot infer ancestry. Model-based methods are still needed for ancestry analysis, but they are not suitable for large datasets. We propose a new structure analysis framework for large datasets. This includes a new heuristic for detecting structure and incorporation of the structure patterns inferred by a PCA method to complement STRUCTURE analysis. RESULTS: A new heuristic called EigenDev for detecting population structure is presented. When tested on simulated data, this heuristic is robust to sample size. In contrast, the TW statistic was found to be susceptible to type I error, especially for large population samples. EigenDev is thus better-suited for analysis of large datasets containing many individuals, in which spurious patterns are likely to exist and could be incorrectly interpreted as population stratification. EigenDev was applied to the iterative pruning PCA (ipPCA) method, which resolves the underlying subpopulations. This subpopulation information was used to supervise STRUCTURE analysis to infer patterns of ancestry at an unprecedented level of resolution. To validate the new approach, a bovine and a large human genetic dataset (3945 individuals) were analyzed. We found new ancestry patterns consistent with the subpopulations resolved by ipPCA. CONCLUSIONS: The EigenDev heuristic is robust to sampling and is thus superior for detecting structure in large datasets. The application of EigenDev to the ipPCA algorithm improves the estimation of the number of subpopulations and the individual assignment accuracy, especially for very large and complex datasets. Furthermore, we have demonstrated that the structure resolved by this approach complements parametric analysis, allowing a much more comprehensive account of population structure. The new version of the ipPCA software with EigenDev incorporated can be downloaded from http://www4a.biotec.or.th/GI/tools/ippca.

    Iterative pruning PCA improves resolution of highly structured populations

    BACKGROUND: Non-random patterns of genetic variation exist among individuals in a population owing to a variety of evolutionary factors. Therefore, populations are structured into genetically distinct subpopulations. As genotypic datasets become ever larger, it is increasingly difficult to correctly estimate the number of subpopulations and assign individuals to them. The computationally efficient non-parametric, chiefly Principal Components Analysis (PCA)-based methods are thus becoming increasingly relied upon for population structure analysis. Current PCA-based methods can accurately detect structure; however, the accuracy in resolving subpopulations and assigning individuals to them is wanting. When subpopulations are closely related to one another, they overlap in PCA space and appear as a conglomerate. This problem is exacerbated when some subpopulations in the dataset are genetically far removed from others. We propose a novel PCA-based framework which addresses this shortcoming. RESULTS: A novel population structure analysis algorithm called iterative pruning PCA (ipPCA) was developed which assigns individuals to subpopulations and infers the total number of subpopulations present. Genotypic data from simulated and real population datasets with different degrees of structure were analyzed. For datasets with simple structures, the subpopulation assignments of individuals made by ipPCA were largely consistent with the STRUCTURE, BAPS and AWclust algorithms. On the other hand, highly structured populations containing many closely related subpopulations could be accurately resolved only by ipPCA, and not by other methods. CONCLUSION: The algorithm is computationally efficient and not constrained by the dataset complexity. 
This systematic subpopulation assignment approach removes the need for prior population labels, which could be advantageous when cryptic stratification is encountered in datasets containing individuals otherwise assumed to belong to a homogeneous population.